gebsh commented Nov 1, 2025

This adjusts the RegExp pattern used to trim text while formatting so that it only matches the four characters that the XML standard defines as whitespace: space, tab, line feed, and carriage return.
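
A minimal TypeScript sketch of the difference, assuming the trimming is done with anchored `replace` calls. `trimXmlWhitespace` and `trimJsWhitespace` are hypothetical names, not the plugin's actual helpers. The `[ \t\n\r]` class follows the XML 1.0 `S` production, while `\s` (like `String.prototype.trim()`) also strips characters such as U+00A0 (no-break space) and U+3000 (ideographic space).

```typescript
// Hypothetical helpers for illustration; the plugin's real code may differ.

// XML 1.0, production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
const XML_WHITESPACE_EDGES = /^[ \t\n\r]+|[ \t\n\r]+$/g;

// Previous behavior: `\s` also matches \v, \f, U+00A0, U+3000, U+FEFF, ...
const JS_WHITESPACE_EDGES = /^\s+|\s+$/g;

// Strip only XML whitespace from both ends of the text.
function trimXmlWhitespace(text: string): string {
  return text.replace(XML_WHITESPACE_EDGES, "");
}

// Strip everything `\s` considers whitespace from both ends of the text.
function trimJsWhitespace(text: string): string {
  return text.replace(JS_WHITESPACE_EDGES, "");
}

// Leading "\n\t" and trailing " \n" are XML whitespace;
// U+00A0 and U+3000 are not, so they must survive formatting.
const sample = "\n\t\u00a0hello\u3000 \n";

console.log(trimXmlWhitespace(sample).length); // 7 (U+00A0 + "hello" + U+3000)
console.log(trimJsWhitespace(sample).length); // 5 ("hello")
```

With the XML-specific class, the no-break and ideographic spaces are treated as content rather than as formatting whitespace.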

I added various invisible characters to the fixture to verify the fix. Please let me know whether having them there in unescaped form is acceptable, since they might trigger warnings in some code editors. (Unfortunately, I don't think there's any way to escape them so that the bug is still covered by tests.) The full list of non-standard invisible characters that the fixture now contains is:

Fixes #789.

gebsh added 2 commits November 1, 2025 14:49
The `\s` RegExp character class matches several characters besides the
ones XML treats as whitespace, such as `\v`, `\f`, and various Unicode
space variants. However, in XML only 4 characters are considered
whitespace: ` ` (space), `\t`, `\n`, and `\r`.

This commit adjusts the pattern so that only these four are matched. It
also adds whitespace characters to the formatting fixture to verify that
they are formatted properly. The first two paragraphs contain tab, space,
and newline characters. The carriage return character was omitted: it's
quite rare, and most text editors convert it to `\n` when saving a file,
so it would likely be lost the next time the fixture is edited. The third
paragraph contains all other whitespace characters that the `\s`
character class includes.

These are not valid XML characters, so it doesn't make sense to test them.
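
To check which of the extra `\s` characters are actually illegal in an XML document, the sketch below tests every BMP code point that `\s` matches against the XML 1.0 `Char` production. It assumes the engine's `\s` follows the current ECMAScript definition (WhiteSpace plus LineTerminator); `isXmlChar` and `invalidXmlCharsMatchedByBackslashS` are hypothetical helpers. Under that assumption only `\v` (U+000B) and `\f` (U+000C) fall outside the allowed range. That is consistent with them being the characters dropped from the fixture, though the commit message itself doesn't name them.

```typescript
// XML 1.0, production [2]:
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXmlChar(codePoint: number): boolean {
  return (
    codePoint === 0x9 ||
    codePoint === 0xa ||
    codePoint === 0xd ||
    (codePoint >= 0x20 && codePoint <= 0xd7ff) ||
    (codePoint >= 0xe000 && codePoint <= 0xfffd) ||
    (codePoint >= 0x10000 && codePoint <= 0x10ffff)
  );
}

// No `g` flag, so `test` keeps no lastIndex state between calls.
const BACKSLASH_S = /\s/;

// Collect every BMP code point that `\s` matches but XML 1.0 forbids.
// (All characters matched by `\s` are in the BMP, so this scan is exhaustive.)
function invalidXmlCharsMatchedByBackslashS(): number[] {
  const result: number[] = [];
  for (let cp = 0; cp <= 0xffff; cp++) {
    if (BACKSLASH_S.test(String.fromCharCode(cp)) && !isXmlChar(cp)) {
      result.push(cp);
    }
  }
  return result;
}

console.log(invalidXmlCharsMatchedByBackslashS().map((cp) => cp.toString(16)).join(","));
// b,c
```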
gebsh changed the title from "Whitespace" to "Only treat space, \t, \n, and \r as whitespace" Nov 1, 2025


Successfully merging this pull request may close these issues.

Whitespace is incorrectly preserved when xmlWhitespaceSensitivity is set to ignore
